ADtrees for Fast Counting and for Fast Learning of Association Rules
Abstract
The problem of discovering association rules in large databases has received considerable research attention. Much research has examined the exhaustive discovery of all association rules involving positive binary literals (e.g., Agrawal et al. 1996). Other research, such as CN2 (Clark and Niblett 1989), has concerned finding complex association rules for high-arity attributes. Complex association rules can represent concepts such as "PurchasedChips=True and PurchasedSoda=False and Area=NorthEast and CustomerType=Occasional ⇒ AgeRange=Young", but their generality comes with severe computational penalties (intractable numbers of preconditions can have large support). Here, we introduce new algorithms by which a sparse data structure called the ADtree, introduced by Moore and Lee (1997), can accelerate the finding of complex association rules in large datasets. The ADtree uses the algebra of probability tables to cache a dataset’s sufficient statistics within a tractable amount of memory. We first introduce a new ADtree algorithm for quickly counting the number of records that match a precondition. We then show how this can be used to accelerate exhaustive search for rules and CN2-type algorithms. Results are presented on a variety of datasets involving many records and attributes.
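To make the counting idea concrete, here is a minimal Python sketch of an ADtree in the general style of Moore and Lee (1997). It assumes categorical attributes encoded as integers `0..arity-1`. It is not the new counting algorithm introduced in this paper. It shows only the basic sparsity trick: each Vary node omits the subtree for its most common value (MCV) and for zero-count values, and the MCV count is recovered by subtraction. Leaf-lists and other memory optimizations are omitted. All names (`ADNode`, `VaryNode`, `count`, `confidence`) are hypothetical. A rule's confidence is then computed from two conjunctive counts, which is the quantity an exhaustive or CN2-style rule search evaluates repeatedly.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[int, ...]        # one value index per attribute (assumed encoding)
Query = List[Tuple[int, int]]   # (attribute, value) pairs, at most one per attribute


class ADNode:
    """Count of the records reaching this node, plus one Vary node per later attribute."""

    def __init__(self, records: List[Record], first_attr: int, arities: Sequence[int]):
        self.count = len(records)
        self.vary: Dict[int, "VaryNode"] = {
            a: VaryNode(records, a, arities) for a in range(first_attr, len(arities))
        }


class VaryNode:
    """Splits records on one attribute; the most common value's subtree is not stored."""

    def __init__(self, records: List[Record], attr: int, arities: Sequence[int]):
        buckets: Dict[int, List[Record]] = {v: [] for v in range(arities[attr])}
        for r in records:
            buckets[r[attr]].append(r)
        self.mcv = max(buckets, key=lambda v: len(buckets[v]))
        # None marks either the elided MCV subtree or a value with zero matching records.
        self.children: Dict[int, Optional[ADNode]] = {
            v: ADNode(recs, attr + 1, arities) if recs and v != self.mcv else None
            for v, recs in buckets.items()
        }


def count(node: Optional[ADNode], query: Query) -> int:
    """Number of records under `node` matching every (attribute, value) in `query`.

    `query` must be sorted by attribute index with no repeated attributes.
    """
    if node is None:          # empty subtree: no matching records
        return 0
    if not query:
        return node.count
    (attr, val), rest = query[0], query[1:]
    vary = node.vary[attr]
    if val != vary.mcv:
        return count(vary.children[val], rest)
    # The MCV subtree was elided, so recover its count by subtraction:
    # count(attr=mcv, rest) = count(rest) - sum over v != mcv of count(attr=v, rest)
    others = sum(count(child, rest) for v, child in vary.children.items() if v != vary.mcv)
    return count(node, rest) - others


def confidence(root: ADNode, pre: Query, post: Query) -> float:
    """Confidence of the rule pre => post, i.e. count(pre and post) / count(pre)."""
    n_pre = count(root, sorted(pre))
    return count(root, sorted(pre + post)) / n_pre if n_pre else 0.0


# Toy dataset: three attributes with arities 2, 2 and 3.
arities = (2, 2, 3)
records = [(0, 0, 0), (0, 1, 2), (1, 0, 0), (0, 0, 1), (1, 1, 2), (0, 0, 0)]
root = ADNode(records, 0, arities)

print(count(root, [(0, 0), (2, 0)]))          # 2
print(confidence(root, [(0, 0)], [(2, 0)]))   # 0.5
```

The subtraction step is the memory/time trade-off in this sketch. Skipping the largest child of every Vary node saves space, but a query that mentions a most-common value costs extra recursive lookups. A production ADtree would also stop expanding small subtrees and scan the raw records instead.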
Publication date: 1998